{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "2_bqml_adv_feat_eng.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.3-final"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zxWsQffj1ysi"
      },
      "source": [
        "# Advanced Feature Engineering in BQML\n",
        "\n",
        "## Learning Objectives\n",
        "\n",
        "1. Evaluate the model\n",
        "2. Extract temporal features, feature cross temporal features\n",
        "3. Apply ML.FEATURE_CROSS to categorical features\n",
        "4. Create a Euclidian feature column, feature cross coordinate features\n",
        "5. Apply the BUCKETIZE function, TRANSFORM clause, L2 Regularization\n",
        "\n",
        "\n",
        "## Introduction \n",
        "In this lab, we utilize feature engineering to improve the prediction of the fare amount for a taxi ride in New York City.  We will use BigQuery ML to build a taxifare prediction model, using feature engineering to improve and create a final model. By continuing the utilization of feature engineering to improve the prediction of the fare amount for a taxi ride in New York City by reducing the RMSE.\n",
        "\n",
        "In this Notebook, we perform a feature cross using BigQuery's ML.FEATURE_CROSS, derive coordinate features, feature cross coordinate features, clean up the code, apply the BUCKETIZE function, the TRANSFORM clause, L2 Regularization, and evaluate model performance throughout the process.\n",
        "\n",
        "Each learning objective will correspond to a __#TODO__  in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive2/feature_engineering/solutions/2_bqml_adv_feat_eng.ipynb)\n"
      ]
    },
    {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "**Tip: Make sure to delete the \"TODO \" line from the code when you are done and about to run the cell.**"
 ]
},
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hJ7ByvoXzpVI"
      },
      "source": [
        "### Set up environment variables and load necessary libraries"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Nny3m465gKkY",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!pip install --user google-cloud-bigquery==3.4.1"
      ],
      "execution_count": 1,
      "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
           "Requirement already satisfied: google-cloud-bigquery==2.34.4 in /opt/conda/lib/python3.7/site-packages (2.34.4)\n",
           "Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (2.4.0)\n",
           "Requirement already satisfied: protobuf<4.0.0dev,>=3.12.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (3.20.3)\n",
           "Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.4.1 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (2.3.2)\n",
           "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (2.28.1)\n",
           "Requirement already satisfied: grpcio<2.0dev,>=1.38.1 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (1.50.0)\n",
           "Requirement already satisfied: proto-plus<2.0.0dev,>=1.15.0 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (1.22.1)\n",
           "Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (2.8.0)\n",
           "Requirement already satisfied: python-dateutil<3.0dev,>=2.7.2 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (2.8.2)\n",
           "Requirement already satisfied: packaging<22.0dev,>=14.3 in /opt/conda/lib/python3.7/site-packages (from google-cloud-bigquery==2.34.4) (21.3)\n",
           "Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (1.35.0)\n",
           "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.52.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (1.56.4)\n",
           "Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /opt/conda/lib/python3.7/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (1.48.2)\n",
           "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.7/site-packages (from google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery==2.34.4) (1.1.2)\n",
           "Requirement already satisfied: six>=1.5.2 in /opt/conda/lib/python3.7/site-packages (from grpcio<2.0dev,>=1.38.1->google-cloud-bigquery==2.34.4) (1.15.0)\n",
           "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging<22.0dev,>=14.3->google-cloud-bigquery==2.34.4) (3.0.9)\n",
           "Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-bigquery==2.34.4) (2.1.1)\n",
           "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-bigquery==2.34.4) (1.26.11)\n",
           "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-bigquery==2.34.4) (2022.9.24)\n",
           "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3.0.0dev,>=2.18.0->google-cloud-bigquery==2.34.4) (3.4)\n",
           "Requirement already satisfied: setuptools>=40.3.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (59.8.0)\n",
           "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (4.9)\n",
           "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (0.2.7)\n",
           "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (4.2.4)\n",
           "Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery==2.34.4) (1.15.1)\n",
           "Requirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.0.0->google-crc32c<2.0dev,>=1.0->google-resumable-media<3.0dev,>=0.6.0->google-cloud-bigquery==2.34.4) (2.21)\n",
           "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-bigquery==2.34.4) (0.4.8)\n"
         ]
        }
      ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Kindly ignore the deprecation warnings and incompatibility errors related to google-cloud-storage."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Note**: Restart your kernel to use updated packages."
     ]
    },
    {
     "cell_type": "code",
     "execution_count": 1,
     "metadata": {
      "colab": {},
      "colab_type": "code",
      "id": "9dEreb4QKizj"
     },
     "outputs": [
      {
       "name": "stdout",
       "output_type": "stream",
       "text": [
        "TensorFlow version:  2.6.5\n"
       ]
      }
     ],
     "source": [
      "import tensorflow as tf\n",
      "\n",
      "print(\"TensorFlow version: \",tf.version.VERSION)"
     ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "F78xGb8xgKkd",
        "colab_type": "code",
        "colab": {},
        "outputId": "4e1b0020-4fe7-4594-824c-6246a0e7edae"
      },
      "source": [
        "%%bash\n",
        "\n",
        "export PROJECT=$(gcloud config list project --format \"value(core.project)\")\n",
        "echo \"Your current GCP Project Name is: \"$PROJECT"
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Your current GCP Project Name is: qwiklabs-gcp-04-5527e928f1d7\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "L0-vOB4y2BJM"
      },
      "source": [
        "## The source dataset\n",
        "\n",
        "Our dataset is hosted in [BigQuery](https://cloud.google.com/bigquery/). The taxi fare data is a publically available dataset, meaning anyone with a GCP account has access. Click [here](https://console.cloud.google.com/bigquery?project=bigquery-public-data&p=nyc-tlc&d=yellow&t=trips&page=table) to access the dataset.\n",
        "\n",
        "The Taxi Fare dataset is relatively large at 55 million training rows, but simple to understand, with only six features. The fare_amount is the target, the continuous value we’ll train a model to predict.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X8HpUXTAgKkh",
        "colab_type": "text"
      },
      "source": [
        "## Create a BigQuery Dataset\n",
        "\n",
        "A BigQuery dataset is a container for tables, views, and models built with BigQuery ML. Let's create one called __feat_eng__ if we have not already done so in an earlier lab. We'll do the same for a GCS bucket for our project too."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "i_PC0J4BgKkh",
        "colab_type": "code",
        "colab": {},
        "outputId": "6b794a09-d144-405f-8178-25ada7825681"
      },
      "source": [
        "%%bash\n",
        "\n",
        "# Create a BigQuery dataset for feat_eng if it doesn't exist\n",
        "datasetexists=$(bq ls -d | grep -w feat_eng)\n",
        "\n",
        "if [ -n \"$datasetexists\" ]; then\n",
        "    echo -e \"BigQuery dataset already exists, let's not recreate it.\"\n",
        "\n",
        "else\n",
        "    echo \"Creating BigQuery dataset titled: feat_eng\"\n",
        "    \n",
        "    bq --location=US mk --dataset \\\n",
        "        --description 'Taxi Fare' \\\n",
        "        $PROJECT:feat_eng\n",
        "   echo \"\\nHere are your current datasets:\"\n",
        "   bq ls\n",
        "fi    "
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Creating BigQuery dataset titled: feat_eng\n",
            "Dataset 'munn-sandbox:feat_eng' successfully created.\n",
            "\\nHere are your current datasets:\n",
            "    datasetId    \n",
            " --------------- \n",
            "  None           \n",
            "  arc_results    \n",
            "  babyweight     \n",
            "  bqml_taxifare  \n",
            "  demo           \n",
            "  demo_sample    \n",
            "  feat_eng       \n",
            "  movielens      \n",
            "  patent_demo    \n",
            "  patents        \n",
            "  stock_market   \n",
            "  taxifare       \n",
            "  test_sample    \n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "b2TuS1s9vREL"
      },
      "source": [
        "## Create the training data table\n",
        "\n",
        "Since there is already a publicly available dataset, we can simply create the training data table using this raw input data.  Note the WHERE clause in the below query:  This clause allows us to TRAIN a portion of the data (e.g. one hundred thousand rows versus one million rows), which keeps your query costs down.  If you need a refresher on using MOD() for repeatable splits see this [post](https://www.oreilly.com/learning/repeatable-sampling-of-data-sets-in-bigquery-for-machine-learning).  \n",
        "\n",
        "* Note:  The dataset in the create table code below is the one created previously, e.g. \"feat_eng\".  The table name is \"feateng_training_data\".  Run the query to create the table."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "CMNRractvREL",
        "colab": {},
        "outputId": "d2a2084b-ec56-48a5-8522-409cc32ea75e"
      },
      "source": [
        "%%bigquery \n",
        "\n",
        "CREATE OR REPLACE TABLE\n",
        "  feat_eng.feateng_training_data AS\n",
        "SELECT\n",
        "  (tolls_amount + fare_amount) AS fare_amount,\n",
        "  passenger_count*1.0 AS passengers,\n",
        "  pickup_datetime,\n",
        "  pickup_longitude AS pickuplon,\n",
        "  pickup_latitude AS pickuplat,\n",
        "  dropoff_longitude AS dropofflon,\n",
        "  dropoff_latitude AS dropofflat\n",
        "FROM\n",
        "  `nyc-tlc.yellow.trips`\n",
        "WHERE\n",
        "  MOD(ABS(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING))), 10000) = 1\n",
        "  AND fare_amount >= 2.5\n",
        "  AND passenger_count > 0\n",
        "  AND pickup_longitude > -78\n",
        "  AND pickup_longitude < -70\n",
        "  AND dropoff_longitude > -78\n",
        "  AND dropoff_longitude < -70\n",
        "  AND pickup_latitude > 37\n",
        "  AND pickup_latitude < 45\n",
        "  AND dropoff_latitude > 37\n",
        "  AND dropoff_latitude < 45"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "clnaaqQsXkwC"
      },
      "source": [
        "## Verify table creation\n",
        "\n",
        "Verify that you created the dataset.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gjNcSsaDgKkm",
        "colab_type": "code",
        "colab": {},
        "outputId": "d0ef8389-89e3-48b7-9576-5c3104370172"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "# LIMIT 0 is a free query; this allows us to check that the table exists.\n",
        "SELECT\n",
        "*\n",
        "FROM\n",
        "  feat_eng.feateng_training_data\n",
        "LIMIT\n",
        "  0"
      ],
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>fare_amount</th>\n",
              "      <th>passengers</th>\n",
              "      <th>pickup_datetime</th>\n",
              "      <th>pickuplon</th>\n",
              "      <th>pickuplat</th>\n",
              "      <th>dropofflon</th>\n",
              "      <th>dropofflat</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: [fare_amount, passengers, pickup_datetime, pickuplon, pickuplat, dropofflon, dropofflat]\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "RhgXan8wvREN"
      },
      "source": [
        "### Baseline Model:  Create the baseline model\n",
        "\n",
        "Next, you create a linear regression baseline model with no feature engineering.  Recall that a model in BigQuery ML represents what an ML system has learned from the training data.  A baseline model is a solution to a problem without applying any machine learning techniques.  \n",
        "\n",
        "When creating a BQML model, you must specify the model type (in our case linear regression) and the input label (fare_amount).  Note also that we are using the training data table as the data source."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "kb_5NlfU7oyT"
      },
      "source": [
        "Now we create the SQL statement to create the baseline model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ixxd1ugPgKkr",
        "colab_type": "code",
        "colab": {},
        "outputId": "ae2bb206-aaa3-42ce-83ef-2f22e059a81c"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.baseline_model OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  pickup_datetime,\n",
        "  pickuplon,\n",
        "  pickuplat,\n",
        "  dropofflon,\n",
        "  dropofflat\n",
        "FROM\n",
        "  feat_eng.feateng_training_data"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 6
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Tq2KYJOM9ULC"
      },
      "source": [
        "\n",
        "Note, the query takes several minutes to complete. After the first iteration is complete, your model (baseline_model) appears in the navigation panel of the BigQuery web UI. Because the query uses a CREATE MODEL statement to create a model, you do not see query results.\n",
        "\n",
        "You can observe the model as it's being trained by viewing the Model stats tab in the BigQuery web UI. As soon as the first iteration completes, the tab is updated. The stats continue to update as each iteration completes."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HO5d50Eic-X1"
      },
      "source": [
        "Once the training is done, visit the [BigQuery Cloud Console](https://console.cloud.google.com/bigquery) and look at the model that has been trained. Then, come back to this notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "RSgIJqN6vREV"
      },
      "source": [
        "### Evaluate the baseline model\n",
        "Note that BigQuery automatically split the data we gave it, and trained on only a part of the data and used the rest for evaluation.  After creating your model, you evaluate the performance of the regressor using the ML.EVALUATE function. The ML.EVALUATE function evaluates the predicted values against the actual data.\n",
        "\n",
        "NOTE: The results are also displayed in the [BigQuery Cloud Console](https://console.cloud.google.com/bigquery) under the **Evaluation** tab."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ofrjof9ngKku",
        "colab_type": "text"
      },
      "source": [
        "Review the learning and eval statistics for the baseline_model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "pKw8SkXjgKku",
        "colab_type": "code",
        "colab": {},
        "outputId": "5be55f39-f3df-4e7c-aa39-12a681bd8d32"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "# Eval statistics on the held out data.\n",
        "SELECT\n",
        "  *,\n",
        "  SQRT(loss) AS rmse\n",
        "FROM\n",
        "  ML.TRAINING_INFO(MODEL feat_eng.baseline_model)"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>training_run</th>\n",
              "      <th>iteration</th>\n",
              "      <th>loss</th>\n",
              "      <th>eval_loss</th>\n",
              "      <th>learning_rate</th>\n",
              "      <th>duration_ms</th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>74.43591</td>\n",
              "      <td>68.880408</td>\n",
              "      <td>None</td>\n",
              "      <td>12426</td>\n",
              "      <td>8.627625</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   training_run  iteration      loss  eval_loss learning_rate  duration_ms  \\\n",
              "0             0          0  74.43591  68.880408          None        12426   \n",
              "\n",
              "       rmse  \n",
              "0  8.627625  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 7
        }
      ]
    },
    {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "**Lab Task #1:** Evaluate the model"
 ]
  },
    {
      "cell_type": "code",
      "metadata": {
        "id": "v0-yRVp2gKkw",
        "colab_type": "code",
        "colab": {},
        "outputId": "72a3067c-5ce1-43c4-d44f-506837cf03f7"
      },
      "source": [
        "# TODO 1\n",
        "# TODO -- Your code here.\n"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.213479</td>\n",
              "      <td>68.880408</td>\n",
              "      <td>0.258098</td>\n",
              "      <td>3.794535</td>\n",
              "      <td>0.226065</td>\n",
              "      <td>0.22613</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             5.213479           68.880408                0.258098   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               3.794535  0.226065             0.22613  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 8
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "xJGbfYuD8a9d"
      },
      "source": [
        "**NOTE:** Because you performed a linear regression, the results include the following columns:\n",
        "\n",
        "*   mean_absolute_error\n",
        "*   mean_squared_error\n",
        "*   mean_squared_log_error\n",
        "*   median_absolute_error\n",
        "*   r2_score\n",
        "*   explained_variance\n",
        "\n",
        "**Resource** for an explanation of the [Regression Metrics](https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234).\n",
        "\n",
        "**Mean squared error** (MSE) - Measures the difference between the values our model predicted using the test set and the actual values. You can also think of it as the distance between your regression (best fit) line and the predicted values. \n",
        "\n",
        "**Root mean squared error** (RMSE) - The primary evaluation metric for this ML problem is the root mean-squared error. RMSE measures the difference between the predictions of a model, and the observed values. A large RMSE is equivalent to a large average error, so smaller values of RMSE are better. One nice property of RMSE is that the error is given in the units being measured, so you can tell very directly how incorrect the model might be on unseen data.\n",
        "\n",
        "**R2**:  An important metric in the evaluation results is the R2 score. The R2 score is a statistical measure that determines if the linear regression predictions approximate the actual data. Zero (0) indicates that the model explains none of the variability of the response data around the mean.  One (1) indicates that the model explains all the variability of the response data around the mean."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "p_21sAIR7LZw"
      },
      "source": [
        "Next, we write a SQL query to take the SQRT() of the mean squared error as your loss metric for evaluation for the benchmark_model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "colab_type": "code",
        "id": "8mAXRTvbvRES",
        "colab": {},
        "outputId": "65784ff8-522b-4c43-f7c6-379fe7c901a8"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.baseline_model)"
      ],
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>8.299422</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  8.299422"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "nW6fnqAW8vyI"
      },
      "source": [
        "#### Model 1:  EXTRACT dayofweek from the pickup_datetime feature.\n",
        "\n",
        "* As you recall, dayofweek is an enum representing the 7 days of the week. This factory allows the enum to be obtained from the int value. The int value follows the ISO-8601 standard, from 1 (Monday) to 7 (Sunday).\n",
        "\n",
        "* If you were to extract the dayofweek from pickup_datetime using BigQuery SQL, the datatype returned would be integer."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tStXdJYhgKk1",
        "colab_type": "text"
      },
      "source": [
        "**Lab Task #2:** We will create a model titled \"model_1\" from the benchmark model and extract out the DayofWeek."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "ZQ0kT2jN-vpm",
        "colab": {},
        "outputId": "599e3b87-c772-4474-c959-2fe80de33c64"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_1 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  pickup_datetime,\n",
        "# TODO 2\n",
        "# TODO -- Your code here.\n"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 10
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "T24XjIJgdLCH"
      },
      "source": [
        "Once the training is done, visit the [BigQuery Cloud Console](https://console.cloud.google.com/bigquery) and look at the model that has been trained. Then, come back to this notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "SRLxpccX_Tin"
      },
      "source": [
        "Next, two distinct SQL statements show the TRAINING and EVALUATION metrics of model_1."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XfN4oxOigKk4",
        "colab_type": "code",
        "colab": {},
        "outputId": "300723d5-18d8-44f4-a600-ff2ffd91eaeb"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *,\n",
        "  SQRT(loss) AS rmse\n",
        "FROM\n",
        "  ML.TRAINING_INFO(MODEL feat_eng.model_1)"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>training_run</th>\n",
              "      <th>iteration</th>\n",
              "      <th>loss</th>\n",
              "      <th>eval_loss</th>\n",
              "      <th>learning_rate</th>\n",
              "      <th>duration_ms</th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>72.440724</td>\n",
              "      <td>88.953232</td>\n",
              "      <td>None</td>\n",
              "      <td>17251</td>\n",
              "      <td>8.511212</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   training_run  iteration       loss  eval_loss learning_rate  duration_ms  \\\n",
              "0             0          0  72.440724  88.953232          None        17251   \n",
              "\n",
              "       rmse  \n",
              "0  8.511212  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 11
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jvke4qzIgKk8",
        "colab_type": "code",
        "colab": {},
        "outputId": "30a07042-034e-4bbb-fc02-e87306a13a0d"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_1)"
      ],
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>5.287076</td>\n",
              "      <td>88.953232</td>\n",
              "      <td>0.260932</td>\n",
              "      <td>3.713439</td>\n",
              "      <td>0.08481</td>\n",
              "      <td>0.084811</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             5.287076           88.953232                0.260932   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               3.713439   0.08481            0.084811  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 12
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "wf-9FBmL_Ti_"
      },
      "source": [
        "Here we run a SQL query to take the SQRT() of the mean squared error as your loss metric for evaluation for the benchmark_model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "CsVBzNef_TjC",
        "colab": {},
        "outputId": "74f91e84-494f-4699-fa99-c0b165fae2b2"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_1)"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>0</td>\n",
              "      <td>9.431502</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  9.431502"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 13
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Lw30UexH8v9P"
      },
      "source": [
        "### Model 2:  EXTRACT hourofday from the pickup_datetime feature\n",
        "\n",
        "As you recall, **pickup_datetime** is stored as a TIMESTAMP, where the Timestamp format is retrieved in the standard output format – year-month-day hour:minute:second (e.g. 2016-01-01 23:59:59).  Hourofday returns the integer number representing the hour number of the given date.\n",
        "\n",
        "Hourofday is best thought of as a discrete ordinal variable (and not a categorical feature), as the hours can be ranked (e.g. there is a natural ordering of the values).  Hourofday has an added characteristic of being cyclic, since 12am follows 11pm and precedes 1am."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zXeGHeWogKlC",
        "colab_type": "text"
      },
      "source": [
        "Next, we create a model titled \"model_2\" and EXTRACT the hourofday from the pickup_datetime feature to improve our model's rmse."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "FHeqcYz-B9F1",
        "colab": {},
        "outputId": "f682ba71-66d8-4550-a2e8-3260dc04354b"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_2 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  #pickup_datetime,\n",
        "  EXTRACT(DAYOFWEEK\n",
        "  FROM\n",
        "    pickup_datetime) AS dayofweek,\n",
        "  EXTRACT(HOUR\n",
        "  FROM\n",
        "    pickup_datetime) AS hourofday,\n",
        "  pickuplon,\n",
        "  pickuplat,\n",
        "  dropofflon,\n",
        "  dropofflat\n",
        "FROM\n",
        "  `feat_eng.feateng_training_data`"
      ],
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "h2yjF6uGCiZh",
        "colab": {},
        "outputId": "a53802ff-821a-427b-e358-1fa4e916576a"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_2)"
      ],
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.256421</td>\n",
              "      <td>70.709518</td>\n",
              "      <td>0.262399</td>\n",
              "      <td>3.895416</td>\n",
              "      <td>0.236668</td>\n",
              "      <td>0.236768</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             5.256421           70.709518                0.262399   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               3.895416  0.236668            0.236768  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "bhfabG8XCiZm",
        "colab": {},
        "outputId": "75c94b12-7d46-4324-8aa7-5774aa15544d"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_2)"
      ],
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>8.408895</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  8.408895"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 16
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "vbOSxv6BDqB-"
      },
      "source": [
        "### Model 3:  Feature cross dayofweek and hourofday using CONCAT\n",
        "\n",
        "First, let’s allow the model to learn traffic patterns by creating a new feature that combines the time of day and day of week (this is called a [feature cross](https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture). \n",
        "\n",
        "Note:  BQML by default assumes that numbers are numeric features, and strings are categorical features.  We need to convert both the dayofweek and hourofday features to strings because the model (Neural Network) will automatically treat any integer as a numerical value rather than a categorical value.  Thus, if not cast as a string, the dayofweek feature will be interpreted as numeric values (e.g. 1,2,3,4,5,6,7) and hour ofday will also be interpreted as numeric values (e.g. the day begins at midnight, 00:00, and the last minute of the day begins at 23:59 and ends at 24:00).  As such, there is no way to distinguish the \"feature cross\" of hourofday and dayofweek \"numerically\".  Casting the dayofweek and hourofday as strings ensures that each element will be treated like a label and will get its own coefficient associated with it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cNVO4nr6gKlK",
        "colab_type": "text"
      },
      "source": [
        "**Lab Task #3:** Create the SQL statement to feature cross the dayofweek and hourofday using the CONCAT function.  Name the model \"model_3\""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "F7l02C9KFMy7",
        "colab": {},
        "outputId": "9d43f216-c138-4c0d-cf7b-4d938ba33d78"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_3 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "# TODO 3\n",
        "# TODO -- Your code here.\n"
      ],
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 17
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Glzorg3YgKlM",
        "colab_type": "code",
        "colab": {},
        "outputId": "3bd57903-74da-46d7-ff26-4348b302053e"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_3)"
      ],
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.265321</td>\n",
              "      <td>69.356068</td>\n",
              "      <td>0.267477</td>\n",
              "      <td>3.887588</td>\n",
              "      <td>0.220371</td>\n",
              "      <td>0.220574</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             5.265321           69.356068                0.267477   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               3.887588  0.220371            0.220574  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 18
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6GtPp2QugKlO",
        "colab_type": "code",
        "colab": {},
        "outputId": "fc5c0942-4da7-4ab7-9fcd-fd4b6ce82917"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_3)"
      ],
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>8.328029</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  8.328029"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 19
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "FbSRbuJ-fYtK"
      },
      "source": [
        "### Model 4:  Apply the ML.FEATURE_CROSS clause to categorical features\n",
        "\n",
        "BigQuery ML now has ML.FEATURE_CROSS, a pre-processing clause that performs a feature cross.  \n",
        "\n",
        "* ML.FEATURE_CROSS generates a [STRUCT](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct-type) feature with all combinations of crossed categorical features, except for 1-degree items (the original features) and self-crossing items.  \n",
        "\n",
        "* Syntax:  ML.FEATURE_CROSS(STRUCT(features), degree)\n",
        "\n",
        "* The feature parameter is a categorical features separated by comma to be crossed. The maximum number of input features is 10. An unnamed feature is not allowed in features. Duplicates are not allowed in features.\n",
        "\n",
        "* Degree(optional): The highest degree of all combinations. Degree should be in the range of [1, 4]. Default to 2.\n",
        "\n",
        "Output: The function outputs a STRUCT of all combinations except for 1-degree items (the original features) and self-crossing items, with field names as concatenation of original feature names and values as the concatenation of the column string values.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "1zZI0X6s1ysl"
      },
      "source": [
        "Examine the components of ML.Feature_Cross"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "hfKmfGqw1ysq",
        "colab": {},
        "outputId": "adb978c5-90cb-4ee2-eaad-ad8060276d16"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL feat_eng.model_4\n",
        "OPTIONS\n",
        "  (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount'])  \n",
        "AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  #pickup_datetime,\n",
        "  #EXTRACT(DAYOFWEEK FROM pickup_datetime) AS dayofweek,\n",
        "  #EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
        "  #CONCAT(CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING), \n",
        "        #CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING)) AS hourofday,\n",
        " ML.FEATURE_CROSS(STRUCT(CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS dayofweek,\n",
        "  CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hourofday)) AS day_hr,\n",
        "  pickuplon,\n",
        "  pickuplat,\n",
        "  dropofflon,\n",
        "  dropofflat\n",
        "  \n",
        "FROM `feat_eng.feateng_training_data`"
      ],
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "G6tpoYhcIgs4"
      },
      "source": [
        "Next, two distinct SQL statements show the TRAINING and EVALUATION metrics of model_1."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NZudI67BgKlU",
        "colab_type": "code",
        "colab": {},
        "outputId": "2dc86a64-8879-4ee3-d1ce-55f3c698371b"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_4)"
      ],
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.500029</td>\n",
              "      <td>93.260577</td>\n",
              "      <td>0.27445</td>\n",
              "      <td>3.808678</td>\n",
              "      <td>0.080263</td>\n",
              "      <td>0.080492</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             5.500029           93.260577                 0.27445   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               3.808678  0.080263            0.080492  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 21
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "AZPAQozDgKlW",
        "colab_type": "code",
        "colab": {},
        "outputId": "5f7f8fe4-f3a9-4ca1-ed9f-af5b7bfabdb6"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_4)"
      ],
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>9.657152</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  9.657152"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 22
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ymzRp28Q1ys1"
      },
      "source": [
        "### Sliding down the slope toward a loss minimum (reduced taxi fare)!\n",
        "* Our fourth model above gives us an RMSE of 9.65 for estimating fares. Recall our heuristic benchmark was 8.29. This may be the result of feature crossing.  Let's apply more feature engineering techniques to see if we can't get this loss metric lower!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "2NAPkAlEEg6C"
      },
      "source": [
        "### Model 5:  Feature cross coordinate features to create a Euclidean feature\n",
        "\n",
        "\n",
        "Pickup coordinate:\n",
        "*  pickup_longitude AS pickuplon\n",
        "*  pickup_latitude AS pickuplat\n",
        "\n",
        "Dropoff coordinate:\n",
        "*   #dropoff_longitude AS dropofflon\n",
        "*   #dropoff_latitude AS dropofflat\n",
        "\n",
        "**Coordinate Features**:\n",
        "* The pick-up and drop-off longitude and latitude data are crucial to predicting the fare amount as fare amounts in NYC taxis are largely determined by the distance traveled.  As such, we need to  teach the model the Euclidean distance between the pick-up and drop-off points.  \n",
        "\n",
        "* Recall that latitude and longitude allows us to specify any location on Earth using a set of coordinates.  In our training data set, we restricted our data points to only pickups and drop offs within NYC. New York city has an approximate longitude range of -74.05 to -73.75 and a latitude range of 40.63 to 40.85.\n",
        "\n",
        "* The dataset contains information regarding the pickup and drop off coordinates. However, there is no information regarding the distance between the pickup and drop off points. Therefore, we create a new feature that calculates the distance between each pair of pickup and drop off points. We can do this using the Euclidean Distance, which is the straight-line distance between any two coordinate points.\n",
        "\n",
        "* We need to convert those coordinates into a single column of a spatial data type.  We will use the ST_DISTANCE and the ST_GEOGPOINT functions.  \n",
        "\n",
        "* ST_DISTANCE:  ST_DISTANCE(geography_1, geography_2).  Returns the shortest distance in meters between two non-empty GEOGRAPHYs (e.g. between two spatial objects).\n",
        "\n",
        "* ST_GEOGPOINT:  ST_GEOGPOINT(longitude, latitude).  Creates a GEOGRAPHY with a single point. ST_GEOGPOINT creates a point from the specified FLOAT64 longitude and latitude parameters and returns that point in a GEOGRAPHY value.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "kCYniJnejNz0"
      },
      "source": [
        "Next we convert the feature coordinates into a single column of a spatial data type. Use the The ST_Distance and the ST.GeogPoint functions.\n",
        "\n",
        "SAMPLE CODE:\n",
        "ST_Distance(ST_GeogPoint(value1,value2), ST_GeogPoint(value3, value4)) AS euclidean\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**Lab Task #4:** Create a Euclidean feature"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "P8mFocaKj9oA",
        "colab": {},
        "outputId": "77cb597a-b130-4b8a-a253-a2d72dc7da2e"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_5 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  #pickup_datetime,\n",
        "  #EXTRACT(DAYOFWEEK FROM pickup_datetime) AS dayofweek,\n",
        "  #EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
        "  #CONCAT(CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING),\n",
        "    #CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING)) AS hourofday,\n",
        "  ML.FEATURE_CROSS(STRUCT(CAST(EXTRACT(DAYOFWEEK\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS dayofweek,\n",
        "      CAST(EXTRACT(HOUR\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS hourofday)) AS day_hr,\n",
        "  #pickuplon,\n",
        "  #pickuplat,\n",
        "  #dropofflon,\n",
        "  #dropofflat,\n",
        "# TODO 4\n",
        "# TODO -- Your code here.\n"
      ],
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 23
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "uy7eK6iXlYPu"
      },
      "source": [
        "Next, two distinct SQL statements show metrics for model_5."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "9atctYyGlYP7",
        "colab": {},
        "outputId": "aa2e5513-4df8-4c0c-d08b-63690b567ecb"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_5)"
      ],
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>3.121311</td>\n",
              "      <td>31.229718</td>\n",
              "      <td>0.106923</td>\n",
              "      <td>2.21774</td>\n",
              "      <td>0.661492</td>\n",
              "      <td>0.661507</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             3.121311           31.229718                0.106923   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0                2.21774  0.661492            0.661507  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 24
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "Lk42mvjzlYQE",
        "colab": {},
        "outputId": "e77d8fcc-ef55-40e5-aac1-5db287e0c4ea"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_5)"
      ],
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.588356</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  5.588356"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 25
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "uUoQmADkmlnV"
      },
      "source": [
        "### Model 6:  Feature cross  pick-up and drop-off locations features\n",
        "\n",
        "In this section, we feature cross the pick-up and drop-off locations so that the model can learn pick-up-drop-off pairs that will require tolls.\n",
        "\n",
        "This step takes the geographic point corresponding to the pickup point and grids to a 0.1-degree-latitude/longitude grid (approximately 8km x 11km in New York—we should experiment with finer resolution grids as well). Then, it concatenates the pickup and dropoff grid points to learn “corrections” beyond the Euclidean distance associated with pairs of pickup and dropoff locations.\n",
        "\n",
        "Because the lat and lon by themselves don't have meaning, but only in conjunction, it may be useful to treat the fields as a pair instead of just using them as numeric values. However, lat and lon are continuous numbers, so we have to discretize them first. That's what SnapToGrid does. \n",
        "\n",
        "\n",
        "* ST_SNAPTOGRID:  ST_SNAPTOGRID(geography_expression, grid_size).  Returns the input GEOGRAPHY, where each vertex has been snapped to a longitude/latitude grid. The grid size is determined by the grid_size parameter which is given in degrees.\n",
        "\n",
        "**REMINDER**: The ST_GEOGPOINT creates a GEOGRAPHY with a single point. ST_GEOGPOINT creates a point from the specified FLOAT64 longitude and latitude parameters and returns that point in a GEOGRAPHY value.  The ST_Distance function returns the minimum distance between two spatial objects.  It also returns meters for geographies and SRID units for geometrics.  "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "B9viHV3l1ytF"
      },
      "source": [
        "The following SQL statement is incorrect.  Modify the code to feature cross the pick-up and drop-off locations features. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "7VjZawfZpQ7Y",
        "colab": {},
        "outputId": "10772712-3259-4f42-da8a-921a33c1e2bd"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_6 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  #pickup_datetime,\n",
        "  #EXTRACT(DAYOFWEEK FROM pickup_datetime) AS dayofweek,\n",
        "  #EXTRACT(HOUR FROM pickup_datetime) AS hourofday,\n",
        "  ML.FEATURE_CROSS(STRUCT(CAST(EXTRACT(DAYOFWEEK\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS dayofweek,\n",
        "      CAST(EXTRACT(HOUR\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS hourofday)) AS day_hr,\n",
        "  #pickuplon,\n",
        "  #pickuplat,\n",
        "  #dropofflon,\n",
        "  #dropofflat,\n",
        "  ST_Distance(ST_GeogPoint(pickuplon,\n",
        "      pickuplat),\n",
        "    ST_GeogPoint(dropofflon,\n",
        "      dropofflat)) AS euclidean,\n",
        "  CONCAT(ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickuplon,\n",
        "          pickuplat),\n",
        "        0.01)), ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropofflon,\n",
        "          dropofflat),\n",
        "        0.01))) AS pickup_and_dropoff\n",
        "FROM\n",
        "  `feat_eng.feateng_training_data`"
      ],
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 26
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zvOfj_k1qijv"
      },
      "source": [
        "Next, we evaluate model_6."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "G8rM09jLqij4",
        "colab": {},
        "outputId": "b24aa617-c46c-4cb9-f943-2e5a2d920945"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_6)"
      ],
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>2.678752</td>\n",
              "      <td>34.885692</td>\n",
              "      <td>0.088598</td>\n",
              "      <td>1.485065</td>\n",
              "      <td>0.640979</td>\n",
              "      <td>0.642637</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             2.678752           34.885692                0.088598   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               1.485065  0.640979            0.642637  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 27
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "CQUfOjPlqij8",
        "colab": {},
        "outputId": "b72347ac-3de0-49d4-b385-2955edf195ab"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_6)"
      ],
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.906411</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  5.906411"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 28
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "I1dW3JLHrP5Z"
      },
      "source": [
        "### Code Clean Up"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Y4RL3f9K1ytN"
      },
      "source": [
        "#### Exercise:  Clean up the code to see where we are\n",
        "\n",
        "Remove all the commented statements in the SQL statement.  We should now have a total of five input features for our model.  \n",
        "1. fare_amount\n",
        "2. passengers\n",
        "3. day_hr\n",
        "4. euclidean\n",
        "5. pickup_and_dropoff"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "-UxZXY18rWG8",
        "colab": {},
        "outputId": "1901b6e0-78ea-4dbb-80a7-ae2da365b509"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_6 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  ML.FEATURE_CROSS(STRUCT(CAST(EXTRACT(DAYOFWEEK\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS dayofweek,\n",
        "      CAST(EXTRACT(HOUR\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS hourofday)) AS day_hr,\n",
        "  ST_Distance(ST_GeogPoint(pickuplon,\n",
        "      pickuplat),\n",
        "    ST_GeogPoint(dropofflon,\n",
        "      dropofflat)) AS euclidean,\n",
        "  CONCAT(ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickuplon,\n",
        "          pickuplat),\n",
        "        0.01)), ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropofflon,\n",
        "          dropofflat),\n",
        "        0.01))) AS pickup_and_dropoff\n",
        "FROM\n",
        "  `feat_eng.feateng_training_data`"
      ],
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 29
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "vSWHC_9Z1ytS"
      },
      "source": [
        "## BQML's Pre-processing functions:\n",
        "\n",
        "Here are some of the preprocessing functions in BigQuery ML:\n",
        "* ML.FEATURE_CROSS(STRUCT(features))    does a feature cross of all the combinations\n",
        "* ML.POLYNOMIAL_EXPAND(STRUCT(features), degree)    creates x, x<sup>2</sup>, x<sup>3</sup>, etc.\n",
        "* ML.BUCKETIZE(f, split_points)   where split_points is an array "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ENp6mUvB1ytT"
      },
      "source": [
        "### Model 7:  Apply the BUCKETIZE Function \n",
        "\n",
        "\n",
        "##### BUCKETIZE \n",
        "Bucketize is a pre-processing function that creates \"buckets\" (e.g bins) - e.g. it bucketizes a continuous numerical feature into a string feature with bucket names as the value.\n",
        "\n",
        "* ML.BUCKETIZE(feature, split_points)\n",
        "\n",
        "* feature: A numerical column.\n",
        "\n",
        "* split_points: Array of numerical points to split the continuous values in feature into buckets. With n split points (s1, s2 … sn), there will be n+1 buckets generated. \n",
        "\n",
        "* Output: The function outputs a STRING for each row, which is the bucket name. bucket_name is in the format of bin_<bucket_number>, where bucket_number starts from 1.\n",
        "\n",
        "* Currently, our model uses the ST_GeogPoint function to derive the pickup and dropoff feature.  In this lab, we use the BUCKETIZE function to create the pickup and dropoff feature."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "B-VzlYCG1ytU"
      },
      "source": [
        "Next, apply the BUCKETIZE function to model_7 and run the query."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "bE3ZRGV21yta",
        "colab": {},
        "outputId": "83952ef3-62a2-4719-e647-5e0285424082"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.model_7 OPTIONS (model_type='linear_reg',\n",
        "    input_label_cols=['fare_amount']) AS\n",
        "SELECT\n",
        "  fare_amount,\n",
        "  passengers,\n",
        "  ST_Distance(ST_GeogPoint(pickuplon,\n",
        "      pickuplat),\n",
        "    ST_GeogPoint(dropofflon,\n",
        "      dropofflat)) AS euclidean,\n",
        "  ML.FEATURE_CROSS(STRUCT(CAST(EXTRACT(DAYOFWEEK\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS dayofweek,\n",
        "      CAST(EXTRACT(HOUR\n",
        "        FROM\n",
        "          pickup_datetime) AS STRING) AS hourofday)) AS day_hr,\n",
        "  CONCAT( ML.BUCKETIZE(pickuplon,\n",
        "      GENERATE_ARRAY(-78, -70, 0.01)), ML.BUCKETIZE(pickuplat,\n",
        "      GENERATE_ARRAY(37, 45, 0.01)), ML.BUCKETIZE(dropofflon,\n",
        "      GENERATE_ARRAY(-78, -70, 0.01)), ML.BUCKETIZE(dropofflat,\n",
        "      GENERATE_ARRAY(37, 45, 0.01)) ) AS pickup_and_dropoff\n",
        "FROM\n",
        "  `feat_eng.feateng_training_data`"
      ],
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "AVPXGKZ374v7"
      },
      "source": [
        "Next, we evaluate model_7."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "yRZc6S101ytc",
        "colab": {},
        "outputId": "237ba0f9-c2f7-4351-b2ce-879b9b301c3c"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *,\n",
        "  SQRT(loss) AS rmse\n",
        "FROM\n",
        "  ML.TRAINING_INFO(MODEL feat_eng.model_7)"
      ],
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>training_run</th>\n",
              "      <th>iteration</th>\n",
              "      <th>loss</th>\n",
              "      <th>eval_loss</th>\n",
              "      <th>learning_rate</th>\n",
              "      <th>duration_ms</th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>20.798913</td>\n",
              "      <td>38.816795</td>\n",
              "      <td>0.4</td>\n",
              "      <td>4776</td>\n",
              "      <td>4.560583</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>71.421406</td>\n",
              "      <td>83.045042</td>\n",
              "      <td>0.2</td>\n",
              "      <td>3388</td>\n",
              "      <td>8.451119</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   training_run  iteration       loss  eval_loss  learning_rate  duration_ms  \\\n",
              "0             0          1  20.798913  38.816795            0.4         4776   \n",
              "1             0          0  71.421406  83.045042            0.2         3388   \n",
              "\n",
              "       rmse  \n",
              "0  4.560583  \n",
              "1  8.451119  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 31
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "-X1oUFpm1yte",
        "colab": {},
        "outputId": "ef90ec44-eb0f-4201-fbb5-9e29a2c240d3"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_7)"
      ],
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>3.441163</td>\n",
              "      <td>38.816795</td>\n",
              "      <td>0.132281</td>\n",
              "      <td>2.554729</td>\n",
              "      <td>0.579521</td>\n",
              "      <td>0.58163</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             3.441163           38.816795                0.132281   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               2.554729  0.579521             0.58163  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 32
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "PXpHTWhv1ytg",
        "colab": {},
        "outputId": "f15ca9f6-b5aa-4084-ce9f-3b7579f7c7eb"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.model_7)"
      ],
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>6.230313</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  6.230313"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 33
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "z8PojyIe1ytk"
      },
      "source": [
        "### Final Model:  Apply the TRANSFORM clause and L2 Regularization\n",
        "\n",
        "Before we perform our prediction, we should encapsulate the entire feature set in a TRANSFORM clause.  BigQuery ML now supports defining data transformations during model creation, which will be automatically applied during prediction and evaluation. This is done through the TRANSFORM clause in the existing CREATE MODEL statement. By using the TRANSFORM clause, user specified transforms during training will be automatically applied during model serving (prediction, evaluation, etc.) \n",
        "\n",
        "In our case, we are using the TRANSFORM clause to separate out the raw input data from the TRANSFORMED features.  The input columns of the TRANSFORM clause is the query_expr (AS SELECT part).  The output columns of TRANSFORM from select_list are used in training. These transformed columns are post-processed with standardization for numerics and one-hot encoding for categorical variables by default. \n",
        "\n",
        "The advantage of encapsulating features in the TRANSFORM clause is the client code doing the PREDICT doesn't change, e.g. our model improvement is transparent to client code. Note that the TRANSFORM clause MUST be placed after the CREATE statement.\n",
        "\n",
        "##### [L2 Regularization](https://developers.google.com/machine-learning/glossary/#L2_regularization) \n",
        "Sometimes, the training RMSE is quite reasonable, but the evaluation RMSE illustrate more error. Given the severity of the delta between the EVALUATION RMSE and the TRAINING RMSE, it may be an indication of overfitting. When we do feature crosses, we run into the risk of overfitting (for example, when a particular day-hour combo doesn't have enough taxi rides).\n",
        "\n",
        "Overfitting is a phenomenon that occurs when a machine learning or statistics model is tailored to a particular dataset and is unable to generalize to other datasets. This usually happens in complex models, like deep neural networks.  Regularization is a process of introducing additional information in order to prevent overfitting.\n",
        "\n",
        "Therefore, we will apply L2 Regularization to the final model.  As a reminder, a regression model that uses the L1 regularization technique is called Lasso Regression while a regression model that uses the L2 Regularization technique is called Ridge Regression.  The key difference between these two is the penalty term.  Lasso shrinks the less important feature’s coefficient to zero, thus removing some features altogether.  Ridge regression adds “squared magnitude” of coefficient as a penalty term to the loss function.\n",
        "\n",
        "In other words, L1 limits the size of the coefficients. L1 can yield sparse models (i.e. models with few coefficients); Some coefficients can become zero and eliminated. \n",
        "\n",
        "L2 regularization adds an L2 penalty equal to the square of the magnitude of coefficients. L2 will not yield sparse models and all coefficients are shrunk by the same factor (none are eliminated). \n",
        "\n",
        "The regularization terms are ‘constraints’ by which an optimization algorithm must ‘adhere to’ when minimizing the loss function, apart from having to minimize the error between the true y and the predicted ŷ.  This in turn reduces model complexity, making our model simpler. A simpler model can reduce the chances of overfitting."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "YZuZPtES1ytl"
      },
      "source": [
        "**Lab Task #5:** Apply the TRANSFORM clause and L2 Regularization to the final model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "rKf_4I771ytn",
        "colab": {},
        "outputId": "c3ed43f9-59de-44e0-b4bb-a35e3e34874e"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "CREATE OR REPLACE MODEL\n",
        "  feat_eng.final_model \n",
        "# TODO 5\n",
        "# TODO -- Your code here.\n",
        "    ML.FEATURE_CROSS(STRUCT(CAST(EXTRACT(DAYOFWEEK\n",
        "          FROM\n",
        "            pickup_datetime) AS STRING) AS dayofweek,\n",
        "        CAST(EXTRACT(HOUR\n",
        "          FROM\n",
        "            pickup_datetime) AS STRING) AS hourofday)) AS day_hr,\n",
        "    CONCAT( ML.BUCKETIZE(pickuplon,\n",
        "        GENERATE_ARRAY(-78, -70, 0.01)), ML.BUCKETIZE(pickuplat,\n",
        "        GENERATE_ARRAY(37, 45, 0.01)), ML.BUCKETIZE(dropofflon,\n",
        "        GENERATE_ARRAY(-78, -70, 0.01)), ML.BUCKETIZE(dropofflat,\n",
        "        GENERATE_ARRAY(37, 45, 0.01)) ) AS pickup_and_dropoff ) OPTIONS(input_label_cols=['fare_amount'],\n",
        "    model_type='linear_reg',\n",
        "    l2_reg=0.1) AS\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  feat_eng.feateng_training_data"
      ],
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "Empty DataFrame\n",
              "Columns: []\n",
              "Index: []"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 34
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "3Sb4U38_1yto"
      },
      "source": [
        "Next, we evaluate the final model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "RqUNost_1ytq",
        "colab": {},
        "outputId": "998edad7-c023-4afb-a820-633a5ed30124"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *,\n",
        "  SQRT(loss) AS rmse\n",
        "FROM\n",
        "  ML.TRAINING_INFO(MODEL feat_eng.final_model)"
      ],
      "execution_count": 35,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>training_run</th>\n",
              "      <th>iteration</th>\n",
              "      <th>loss</th>\n",
              "      <th>eval_loss</th>\n",
              "      <th>learning_rate</th>\n",
              "      <th>duration_ms</th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>11.543481</td>\n",
              "      <td>29.058845</td>\n",
              "      <td>0.4</td>\n",
              "      <td>4625</td>\n",
              "      <td>3.397570</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>14.003867</td>\n",
              "      <td>31.897211</td>\n",
              "      <td>0.8</td>\n",
              "      <td>5065</td>\n",
              "      <td>3.742174</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>21.037874</td>\n",
              "      <td>32.961365</td>\n",
              "      <td>0.4</td>\n",
              "      <td>4679</td>\n",
              "      <td>4.586706</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>69.922209</td>\n",
              "      <td>76.298004</td>\n",
              "      <td>0.2</td>\n",
              "      <td>3311</td>\n",
              "      <td>8.361950</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   training_run  iteration       loss  eval_loss  learning_rate  duration_ms  \\\n",
              "0             0          3  11.543481  29.058845            0.4         4625   \n",
              "1             0          2  14.003867  31.897211            0.8         5065   \n",
              "2             0          1  21.037874  32.961365            0.4         4679   \n",
              "3             0          0  69.922209  76.298004            0.2         3311   \n",
              "\n",
              "       rmse  \n",
              "0  3.397570  \n",
              "1  3.742174  \n",
              "2  4.586706  \n",
              "3  8.361950  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 35
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "bdQnFs8q1yts",
        "colab": {},
        "outputId": "ba9df792-bb74-49d1-a284-7efdcc8b6ba7"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.final_model)"
      ],
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>mean_absolute_error</th>\n",
              "      <th>mean_squared_error</th>\n",
              "      <th>mean_squared_log_error</th>\n",
              "      <th>median_absolute_error</th>\n",
              "      <th>r2_score</th>\n",
              "      <th>explained_variance</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>2.623061</td>\n",
              "      <td>29.058845</td>\n",
              "      <td>0.086064</td>\n",
              "      <td>1.567397</td>\n",
              "      <td>0.673497</td>\n",
              "      <td>0.67393</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
              "0             2.623061           29.058845                0.086064   \n",
              "\n",
              "   median_absolute_error  r2_score  explained_variance  \n",
              "0               1.567397  0.673497             0.67393  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 36
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "MhaD7-rI1ytu",
        "colab": {},
        "outputId": "ef5c5b2c-2451-40eb-ee96-81b48de1542f"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  SQRT(mean_squared_error) AS rmse\n",
        "FROM\n",
        "  ML.EVALUATE(MODEL feat_eng.final_model)"
      ],
      "execution_count": 37,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>rmse</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5.390626</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "       rmse\n",
              "0  5.390626"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 37
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "g5cnCLq72Uu8"
      },
      "source": [
        "### Predictive Model\n",
        "\n",
        "\n",
        "Now that you have evaluated your model, the next step is to use it to predict an outcome. You use your model to predict the taxifare amount. \n",
        "The ML.PREDICT function is used to predict results using your model: feat_eng.final_model.  \n",
        "\n",
        "Since this is a regression model (predicting a continuous numerical value), the best way to see how it performed is to evaluate the difference between the value predicted by the model and the benchmark score. We can do this with an ML.PREDICT query."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wt97FSfvgKmJ",
        "colab_type": "text"
      },
      "source": [
        "Now, apply the ML.PREDICT function."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "-ohaHCFW204X",
        "colab": {},
        "outputId": "b865a56b-af5b-4144-f3f1-75455f2215c8"
      },
      "source": [
        "%%bigquery\n",
        "\n",
        "SELECT\n",
        "  *\n",
        "FROM\n",
        "  ML.PREDICT(MODEL feat_eng.final_model,\n",
        "    (\n",
        "    SELECT\n",
        "      -73.982683 AS pickuplon,\n",
        "      40.742104 AS pickuplat,\n",
        "      -73.983766 AS dropofflon,\n",
        "      40.755174 AS dropofflat,\n",
        "      3.0 AS passengers,\n",
        "      TIMESTAMP('2019-06-03 04:21:29.769443 UTC') AS pickup_datetime ))"
      ],
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>predicted_fare_amount</th>\n",
              "      <th>pickuplon</th>\n",
              "      <th>pickuplat</th>\n",
              "      <th>dropofflon</th>\n",
              "      <th>dropofflat</th>\n",
              "      <th>passengers</th>\n",
              "      <th>pickup_datetime</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>7.516129</td>\n",
              "      <td>-73.982683</td>\n",
              "      <td>40.742104</td>\n",
              "      <td>-73.983766</td>\n",
              "      <td>40.755174</td>\n",
              "      <td>3.0</td>\n",
              "      <td>2019-06-03 04:21:29.769443+00:00</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   predicted_fare_amount  pickuplon  pickuplat  dropofflon  dropofflat  \\\n",
              "0               7.516129 -73.982683  40.742104  -73.983766   40.755174   \n",
              "\n",
              "   passengers                  pickup_datetime  \n",
              "0         3.0 2019-06-03 04:21:29.769443+00:00  "
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 38
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "aQeF100u3U3Z"
      },
      "source": [
        "### Lab Summary: \n",
        "Our ML problem:  Develop a model to predict taxi fare based on distance -- from one point to another in New York City. \n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "bW4twucJ3c37"
      },
      "source": [
        "#### OPTIONAL Exercise: Create a RMSE summary table."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "aBJGVTQJ2Cp3"
      },
      "source": [
        "Create a RMSE summary table:"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zLDAWbEGgKmL",
        "colab_type": "text"
      },
      "source": [
        "**Solution Table**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "m8qPoEd7gKmM",
        "colab_type": "text"
      },
      "source": [
        "| Model       | Taxi Fare | Description                           |\n",
        "|-------------|-----------|---------------------------------------|\n",
        "| model_4     | 9.65      | --Feature cross categorical features  |\n",
        "| model_5     | 5.58      | --Create a Euclidian feature column   |\n",
        "| model_6     | 5.90      | --Feature cross Geo-location features |\n",
        "| model_7     | 6.23      | --Apply the TRANSFORM Clause          |\n",
        "| final_model | 5.39      | --Apply L2 Regularization             |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "nAsY468W4BuK"
      },
      "source": [
        "**RUN** the cell to visualize a RMSE bar chart."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "id": "kRNfpJvW3V2n",
        "colab": {},
        "outputId": "3de43107-1b50-4e32-c5aa-79e5fa030557"
      },
      "source": [
        "import matplotlib.pyplot as plt\n",
        "%matplotlib inline\n",
        "plt.style.use('ggplot')\n",
        "\n",
        "x = ['m4', 'm5', 'm6','m7', 'final']\n",
        "RMSE = [9.65,5.58,5.90,6.23,5.39]\n",
        "\n",
        "x_pos = [i for i, _ in enumerate(x)]\n",
        "\n",
        "plt.bar(x_pos, RMSE, color='green')\n",
        "plt.xlabel(\"Model\")\n",
        "plt.ylabel(\"RMSE\")\n",
        "plt.title(\"RMSE Model Summary\")\n",
        "\n",
        "plt.xticks(x_pos, x)\n",
        "\n",
        "plt.show()"
      ],
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEaCAYAAAD+E0veAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAH59JREFUeJzt3XlUVeXixvHnACIogwoiggNqWqhXzCG1FFTIdXMur1YO5VDmFL9MTUtJ85qLTDMtHMqp1GurrKxrWkleIbFyni2H66ylaDjhEJz9+6N8lycsyCtno+f7WYu13Pu8Z/O8WztPe+9z9nFYlmUJAABJXnYHAAAUHZQCAMCgFAAABqUAADAoBQCAQSkAAAxKASiAefPmycfH5y89Z8yYMbrjjjsKKRFQOCgFGD179pTD4ZDD4ZC3t7cqVKigxx57TEePHnUZ17x5czkcDj377LN5tjFlyhQ5HI48L4Zz585V/fr1FRQUpMDAQEVHR+vJJ580j69atcr87t//LF68ON/MDz30UJ7HPvnkEzkcjr/8Yu5Op06dUmJioqpUqaLixYurbNmyatasmRYtWmR3NHioovtfC2zRrFkzvf/++8rNzdW+ffs0cOBAde7cWWvWrHEZV6lSJc2fP1/Jycny9fU169966y1VrlzZZey8efPUv39/TZw4Ua1atZIk7dq1S5988kme379x40aVL1/eZV3p0qX/NHOlSpW0dOlS/fTTTypXrpxZP3PmTFWuXFlHjhwp2ORt0KlTJ2VlZWnmzJm68847lZmZqe+++06nTp2yO9pNd+XKFZd/KyiaOFKAC19fX4WHhysyMlKxsbHq27evvvnmG509e9ZlXHx8vAICAvTxxx+bdatXr9bhw4fVuXNnl7FLlixR27ZtNWjQINWoUUM1atRQhw4dNGfOnDy/v2zZsgoPD3f5KV68+J9mrl69uho3bqx58+aZdYcOHdKKFSvUq1evPOOXLVum+vXrq3jx4goLC9OAAQN04cIF87jT6VRSUpLCwsIUEBCghx9+WD///HOe7axYsUL33Xef/P39FRkZqV69ev2lF/OsrCylpaVp3LhxatWqlSpXrqz69etrwIABGjRokBnXvHlzPfHEEy7PHTdunKKiosxyz549lZCQoDfeeEMVKlRQQECAnnjiCf3yyy+aMWOGKleurNKlS6tv3766cuWKy7b79OmjUaNGKSwsTKVKldLIkSPldDo1duxYlStXTmXLltXIkSNdfv+//vUvNWrUSMHBwQoNDVWbNm20e/du8/iBAwfkcDi0cOFCtW7dWiVLllRSUpKqVq2q8ePHu2zrwoULCgoK0vz58wu871B4KAX8oWPHjmnx4sXy9vaWt7e3y2NeXl7q06eP3n77bbPurbfeUteuXVWyZEmXseXLl9f69etdXjRutr59+2rWrFm6eteWWbNmKT4+Ps9Ry9atW9W+fXvFxsZqy5Yteuedd7R06VL169fPjHnjjTf02muv6dVXX9XGjRtVv359vfTSSy7bWblypTp06KBHHnlEW7du1ZIlS3TgwAE99NBDKuidYwICAhQYGKhPPvnEpZRu1Nq1a7V+/XqtWLFCixYt0oIFC9S+fXutWbNGn3/+uRYsWKD58+dr9uzZLs9bvHixfvnlF61evVqvvfaaxo8frzZt2uj8+fP6+uuvNXHiRI0fP17Lly83z7l8+bJGjRqljRs3asWKFfL29labNm1cCkeShg8frm7dumn79u3q16+fnnzySc2ePdtlH7333nvy8fHJ8z8TsIkF/Obxxx+3vL29rZIlS1r+/v6WJEuSNWTIEJdxcXFxVp8+fayjR49axYoVs/bt22edPn3a8vf3tzZs2GCNHj3aqlatmhl//Phx67777rMkWZUrV7a6dOlizZw50zp//rwZ85///MeSZJUoUcIqWbKky8/Ro0f/NHN8fLx18eJFq0yZMtbKlSutnJwcKzIy0vrwww+tuXPnWt7e3mZ89+7drYYNG7psY8mSJZbD4bAOHDhgWZZlRUZGWi+88ILLmE6dOrlsJy4uzho+fLjLmIMHD1qSrE2bNlmWZeXZD9fz0UcfWSEhIVaxYsWs+vXrW4mJidZXX3113f19rX/+859W5cqVXfZD2bJlrcuXL5t1rVu3tkJCQqxLly6Zde3bt7c6derksu2YmBiXbdesWdOqXbu2y7o6derk+XdwrVOnTlmSrNWrV1uWZVn79++3JFljx451Gffjjz9axYoVs1asWGHWNW7c2EpMTPzDbcO9OFKAi0aNGmnz5s1au3atkpKS1KRJE40bN+66YyMiItS6dWvNmjVL8+fPV3R0tOrVq5dnXHh4uFavXq2dO3fq+eefV8mSJfXcc8+pdu3aOnHihMvYL774Qps3b3b5ufY6wR/x8/NTjx499Pbbb+uzzz5TTk6O2rVrl2fcjh07FBsb67IuLi5OlmVp586dOnv2rI4ePap7773XZUzTpk1dltetW6fXX39dAQEB5qdmzZqSpD179uSb96oHH3xQR48e1eeff65OnTpp586dio+P18CBAwu8jauio6NdztmHh4frzjvvdDn9Fh4enmefx8TEuCyHh4erTp06edZd+7zNmzfrwQcfVJUqVRQYGKhKlSpJkg4ePOjyvHvuucdluVy5curQoYM5wty+fbu+/fZblzcdwF5caIYLf39/886h2rVra9++fXr66addThNdq2/fvurTp4/KlCmjxMTEP912dHS0oqOj9dRTTykpKUk1atTQ9OnTNXr0aDMmKipKFSpUuKHsffv2Vb169XT48GH16tVLxYoVu6HtFITT6dTw4cPVo0ePPI+Fh4f/pW0VL15cLVu2VMuWLfX8889r3LhxSkpK0rBhwxQVFSUvL688p6R++eWXPNv5/XwdDsd11zmdzv/pednZ2WrVqpWaNm2quXPnmtKuVatWntNHvz+VKEn9+vVT69atlZmZqVmzZqlJkyaqXbt2nnGwB6WAPzVmzBjzQt6gQYM8j//973+Xr6+vDh48qK5duxZ4u1FRUSpRokSe/2v9X9SsWVMNGzZURkaGy0Xna9WqVUvp6eku69LS0uRwOFSrVi0FBQUpMjJSa9asUZs2bcyYjIwMl+c0aNBAO3bsKJTPIURHR0uSTp48qaioKIWFhenYsWMuYzZu3HjTf29B7dq1SydPntTLL79ssq5Zs6bA11JatmypSpUqaebMmZo/f74mTpxYmHHxF1EK+FPVq1dXu3btNHLkSH3xxRd5Hvfy8tL27dvldDoVGBh43W30799f4eHh5sUgMzNTU6ZM0dmzZ9WxY0eXsSdPnszzuYKgoCCVKFGiQHm/+OILXbp0SWXKlLnu48OGDVO9evU0ePBgPfXUUzpw4ICefvppdevWzZwCGTJkiJKSknTXXXepcePG+vTTT5WamuqynbFjx6pVq1Z69tln9dhjjykwMFB79uzRBx98oDfffFP+/v75Zj116pQ6deqkXr16KSYmRqVKldL27dv1/PPPq0qVKqpbt64kKSEhQf3799cHH3ygu+++W4sXL9bXX3+tUqVKFWif3GyVK1dW8eLF9cYbb2jIkCE6cOCARowYIYfDUaDnOxwO9e3bV6NGjZK/v78efvjhQk6Mv4JrCsjXsGHD9OWXX2rVqlXXfTwwMFDBwcF/+Pz7779fGzZs0KOPPqoaNWqodevWOn78uJYtW6b777/fZWy9evVUvnx5l59p06YVOGuJEiX+sBAkqU6dOvr000+Vnp6umJgY9ejRQ23atNGMGTPMmP/7v/9TYmKiBg8erLp16+qbb77Riy++6LKdFi1aaOXKldq6dauaNWumOnXqaPDgwQoMDCzwaauAgADde++9SklJUcuWLRUdHa3ExES1bNlSaWlpZjuPP/64Bg4cqIEDB6pBgwY6fPhwvqfqClNoaKgWLFigFStWqFatWho6dKgmTpwoL6+Cv5z06tVLlmWpW7duBS58uIfDKugxHwDcJDt27FDt2rW1efPmPBe6YS9KAYDbXL58WZmZmerfv7/Onz+vlStX2h0Jv8PpIwBus2jRIlWsWFH79+/X9OnT7Y6D6+BIAQBgcKQAADDc8pbUadOmaePGjQoODtakSZMkSefPn9fkyZN18uRJlS1bVoMHD1ZAQIA74gAA/oBbTh/t3LlTfn5+SklJMaWwYMECBQQEqGPHjlqyZInOnz+v7t27F2h7v/8gT1ESGhqqzMxMu2PYxpPn78lzlzx7/rfC3CMiIgo0zi2nj2rWrJnnKGDdunWKi4uT9Ou9Z9atW+eOKACAP2HbJ5rPnDljvjylVKlSOnPmzB+OTU1NNZ8oTU5OVmhoqFsy3ggfH58ina+wefL8PXnukmfP/3aae5G4zcXVr138IwkJCUpISDDLRfkw7VY4jCxMnjx/T5675NnzvxXmXqROH11PcHCw+Tarn3/+WUFBQXZFAQD8xrZSaNCggdLS0iT9epfKhg0b2hUFAPAbt5w+ev3117Vz506dO3dO/fr1U5cuXdSxY0dNnjxZK1euNG9JBQDYyy2l8Mwzz1x3/e/vPAkAsBefaAYAGJQCAMAoEm9JdZfItyPtjnDTHH3yqN0RANyGOFIAABiUAgDAoBQAAAalAAAwKAUAgEEpAAAMSgEAYFAKAACDUgAAGJQCAMCgFAAABqUAADAoBQCAQSkAAAxKAQBgUAoAAINSAAAYlAIAwKAUAAAGpQAAMCgFAIBBKQAADEoBAGBQCgAAg1IAABiUAgDAoBQAAAalAAAwKAUAgEEpAAAMSgEAYPjYHWDp0qVauXKlHA6HKlasqAEDBsjX19fuWADgkWw9Ujh9+rSWL1+u5ORkTZo0SU6nU2vWrLEzEgB4NNtPHzmdTl25ckW5ubm6cuWKSpcubXckAPBYtp4+KlOmjNq1a6f+/fvL19dXMTExiomJyTMuNTVVqampkqTk5GSFhoa6O2qRU1T3gY+PT5HNVtg8ee6SZ8//dpq7raVw/vx5rVu3TikpKSpRooRee+01paenKzY21mVcQkKCEhISzHJmZqa7oxY5RXUfhIaGFtlshc2T5y559vxvhblHREQUaJytp4+2bdumsLAwBQUFycfHR40aNdLu3bvtjAQAHs3WUggNDdWePXt0+fJlWZalbdu2KTIy0s5IAODRbD19VL16dTVu3FjDhw+Xt7e3oqKiXE4TAQDcy/bPKXTp0kVdunSxOwYAQEXgLakAgKKDUgAAGJQCAMCgFAAABqUAADAoBQCAQSkAAAxKAQBgUAoAAINSAAAYlAIAwKAUAAAGpQAAMCgFAIBBKQAADEoBAGBQCgAAg1IAABiUAgDAoBQAAAalAAAwKAUAgEEpAAAMSgEAYFAKAACDUgAAGJQCAMCgFAAABqUAADAoBQCAQSkAAAxKAQBg+NgdAEDhinw70u4IN83RJ4/aHeG2x5ECAMCw/UjhwoULmjFjhg4fPiyHw6H+/furRo0adscCAI9keynMnTtXdevW1ZAhQ5STk6PLly/bHQkAPJatp4+ys7O1a9cutWzZUpLk4+OjkiVL2hkJADyarUcKJ06cUFBQkKZNm6aDBw+qatWq6tmzp/z8/FzGpaamKjU1VZKUnJys0NBQO+IWKUV1H/j4+BTZbIXNk+fuLkV1/95Of/e2lkJubq7279+v3r17q3r16po7d66WLFmiRx55xGVcQkKCEhISzHJmZqa7oxY5RXUfhIaGFtlshc2T5+4uRXX/3gp/9xEREQUaZ2sphISEKCQkRNWrV5ckNW7cWEuWLLEzEm5TvC0TKBhbrymUKlVKISEhOnbsmCRp27ZtqlChgp2RAMCj2f7uo969e2vq1KnKyclRWFiYBgwYYHckAPBYtpdCVFSUkpOT7Y4BABCfaAYAXINSAAAY+ZbCnDlzXJZXrlzpsjxx4sSbmwgAYJt8rymkpaWpd+/eZnn+/PnmE8jSr+8YQtHHWzIBFES+RwqWZbkjBwCgCMi3FBwOhztyAACKgHxPH+Xm5mr79u1m2el05lkGANwe8i2F4OBgTZ8+3SwHBAS4LAcFBRVOMgC4CW6X62nuupaWbymkpKS4IwcAoAi4oc8pHDt2TGvXrtXJkydvdh4AgI3yPVJ45513VKVKFcXGxkr69S2q06dPV8mSJXXp0iUNHTpUd999d6EHBQAUvnyPFNatW6eaNWua5UWLFqlXr16aPXu2nnzySS1evLhQAwIA3CffUjh37pz5RqFDhw7p3Llz5sNrsbGx5rbXAIBbX76lUKJECWVlZUmSvv/+e1WrVk3FihWTJOXk5BRuOgCAW+V7TaFJkyaaMmWKGjZsqKVLl6pjx47msb1796pcuXKFGhAA4D75Hil07dpVNWvW1NatW/N8V/KBAwdclgEAt7Z8jxR8fHzUuXPn6z7WunXrmx4IAGCfAt0lNT9xcXE3JQwAwF75lsK0adMUHh6uUqVKXfeOqQ6Hg1IAgNtEvqXwwAMP6Ntvv5Wfn5/i4uLUsGFD8+4jAMDtJd9S6Nmzpx577DFt3rxZaWlpmjdvnurVq6fmzZvrrrvuckdGAICbFOjeR15eXqpXr54GDx6s119/XQEBARozZozLLbQBALe+fI8UrsrOzlZGRobS0tJ09uxZderUSVFRUYUYDQDgbvmWwvr165Wenq7vv/9eDRo0UPfu3TltBAC3qXxL4dVXX1VERISaNWsmX19fbdmyRVu2bHEZ8/DDDxdaQACA++RbCrGxsXI4HDp37pw78gAAbJRvKQwcOPAPHzt48KA+/PDDmxoIAGCffEvh8uXL+vjjj3XgwAGVL19enTt31rlz5/Tuu+9q27Zt5st3AAC3vnxLYfbs2dq/f79iYmK0efNmHTp0SMeOHVNcXJyeeuopBQUFuSMnAMAN8i2FLVu2aMKECQoODtYDDzygAQMGaMyYMYqOjnZHPgCAG+X74bVLly4pODhYkhQSEiI/Pz8KAQBuU/keKeTm5ub55PLvl2vXrn1zUwEAbJFvKQQHB2v69OlmOSAgwGXZ4XDozTffLJx0AAC3yrcUUlJS3JEDAFAEFOiGeIXN6XTqueeeU3Jyst1RAMCjFYlSWLZsmSIjI+2OAQAez/ZSOHXqlDZu3Kj4+Hi7owCAxyvwrbMLy7x589S9e3ddvHjxD8ekpqYqNTVVkpScnKzQ0FB3xSuyPHkfePLcJc+eP3MvfLaWwoYNGxQcHKyqVatqx44dfzguISFBCQkJZjkzM9Md8Yo0T94Hnjx3ybPnz9xvXERERIHG2VoKP/zwg9avX69NmzbpypUrunjxoqZOnarExEQ7YwGAx7K1FLp27aquXbtKknbs2KF///vfFAIA2Mj2C80AgKLD9gvNV9WqVUu1atWyOwYAeDSOFAAABqUAADAoBQCAQSkAAAxKAQBgUAoAAINSAAAYlAIAwKAUAAAGpQAAMCgFAIBBKQAADEoBAGBQCgAAg1IAABiUAgDAoBQAAAalAAAwKAUAgEEpAAAMSgEAYFAKAACDUgAAGJQCAMCgFAAABqUAADAoBQCAQSkAAAxKAQBgUAoAAINSAAAYlAIAwKAUAAAGpQAAMHzs/OWZmZlKSUlRVlaWHA6HEhIS1Lp1azsjAYBHs7UUvL291aNHD1WtWlUXL17UiBEjVKdOHVWoUMHOWADgsWw9fVS6dGlVrVpVkuTv76/IyEidPn3azkgA4NFsPVK41okTJ7R//37dcccdeR5LTU1VamqqJCk5OVmhoaHujlfkePI+8OS5S549f+Ze+IpEKVy6dEmTJk1Sz549VaJEiTyPJyQkKCEhwSxnZma6M16R5Mn7wJPnLnn2/Jn7jYuIiCjQONvffZSTk6NJkyapWbNmatSokd1xAMCj2VoKlmVpxowZioyMVNu2be2MAgCQzaePfvjhB6Wnp6tSpUoaNmyYJOnRRx9VvXr17IwFAB7L1lK466679P7779sZAQBwDduvKQAAig5KAQBgUAoAAINSAAAYlAIAwKAUAAAGpQAAMCgFAIBBKQAADEoBAGBQCgAAg1IAABiUAgDAoBQAAAalAAAwKAUAgEEpAAAMSgEAYFAKAACDUgAAGJQCAMCgFAAABqUAADAoBQCAQSkAAAxKAQBgUAoAAINSAAAYlAIAwKAUAAAGpQAAMCgFAIBBKQAADEoBAGD42B1g8+bNmjt3rpxOp+Lj49WxY0e7IwGAx7L1SMHpdGr27Nl64YUXNHnyZGVkZOjIkSN2RgIAj2ZrKezdu1fh4eEqV66cfHx8dO+992rdunV2RgIAj2br6aPTp08rJCTELIeEhGjPnj15xqWmpio1NVWSlJycrIiIiBv6fdZo68aC3gY8ee6SZ8/fk+cuMf+/6pa40JyQkKDk5GQlJyfbHSVfI0aMsDuCrTx5/p48d8mz5387zd3WUihTpoxOnTpllk+dOqUyZcrYmAgAPJutpVCtWjUdP35cJ06cUE5OjtasWaMGDRrYGQkAPJqt1xS8vb3Vu3dvvfzyy3I6nWrRooUqVqxoZ6T/WUJCgt0RbOXJ8/fkuUuePf/bae4Oy7K4CgMAkHSLXGgGALgHpQAAMGy/zcXtbu/evRo1apSeeeYZNW7c2O44brNjxw5NmDBBYWFhkqRGjRrpH//4h82p3GfHjh2aN2+ecnNzFRgYqJdeesnuSG7z6aef6uuvv5b0610Ljhw5otmzZysgIMDmZH/dsmXLtGLFCmVlZalDhw43fBueHj16aP78+Tc5XeGgFAqR0+nUwoULFRMTY3cUW0RHR99W798uqAsXLmjWrFkaOXKkQkNDdebMGbsjuVX79u3Vvn17SdL69ev12Wef3ZKFIElffvmlkpKSXD5ke7ujFG7AiRMnNH78eFWvXl27d+9WtWrV1Lx5c33wwQc6c+aMEhMTdccdd2j58uVq1KiR9u3bZ3fkm6og879dFWTu+/btU6NGjRQaGipJCg4Otjn1zVPQf/tXZWRk6L777rMx8Y1766239NNPP2n8+PFq0aKFfvrpJ/Xp00cpKSny9/fXf//7X2VlZal79+5q3LixLl26pAkTJujChQvKycnRI488ooYNG9o9jb+Mawo36Mcff1S7du00efJkHT16VKtXr9bYsWPVo0cPffTRRzp9+rTWrl2rVq1a2R21UOQ3f0navXu3hg0bpvHjx+vw4cM2J7558pv78ePHdf78eY0ZM0bDhw9XWlqa3ZFvqoL83UvS5cuXtXnz5lv2tGnfvn1VpkwZjR49Os+RTlZWlsaOHasRI0Zo4cKFkqRixYpp6NCheuWVVzR69Gi9++67uhXf3MmRwg0KCwtTpUqVJEkVK1bU3/72NzkcDlWqVEknT57UvHnz1K1bN3l53Z69m9/8q1SpomnTpsnPz08bN27Uq6++qqlTp9qc+ubIb+4hISHav3+/kpKSdOXKFY0aNUrVq1e/4Xt2FTX5zf+qDRs26M4777xlTx39mYYNG8rLy0sVKlQwpwcty9KiRYu0a9cuORwOnT59WmfOnFGpUqVsTvvXUAo3qFixYubPDofDLDscDjmdTu3bt09TpkyRJJ09e1abNm2Sl5eX7rnnHlvy3mz5zb9EiRLm8Xr16mn27Nk6e/asgoKC3J71Zstv7iEhIQoMDJSfn5/8/PwUHR2tgwcP3jalkN/8r8rIyFDTpk3dns8drt0HV48GVq9erbNnzyo5OVk+Pj4aOHCgrly5YlfEG0YpFJKUlBSXP9evX/+2KYSCyMrKUnBwsBwOh/bu3Sun06nAwEC7Y7lFgwYNNGfOHOXm5ionJ0d79+5VmzZt7I7lVtnZ2dq5c6eefvppu6O4TXZ2toKDg+Xj46Pt27e7HDXdSigFFIpvv/1WX375pby9veXr66tnnnlGDofD7lhuUaFCBdWtW1dDhw6Vl5eXWrZsaU63eIq1a9cqJiZGfn5+dkdxm6ZNm+qVV17RkCFDVK1aNUVGRtod6YZwmwsAgHF7XgUFANwQSgEAYFAKAACDUgAAGJQCAMCgFIBCcOLECXXp0kW5ubn5jl21apWSkpLckArIH6UASBo4cKAeffRRnT171mX9c889py5duujEiRM2JQPci1IAfhMWFqaMjAyzfOjQIV2+fNnGRID78Ylm4DexsbFKT0/XAw88IOnX0zpxcXF67733JP16G4M5c+Zo06ZNKl68uOLj4/Xggw/Ky8tLTqdTCxYsUFpamvz9/dW2bVuXbWdnZ+udd97Rpk2b5HA41KJFC3Xp0uW2vWEibl38iwR+U716dWVnZ+vIkSNyOp1as2aNmjVrZh6fM2eOsrOz9eabb2rMmDFKT0/XqlWrJEmpqanauHGjXnnlFSUnJ+u7775z2XZKSoq8vb01depUTZgwQVu2bNFXX33lzukBBUIpANe4erSwdetWRUZGqkyZMpJ+/Ra9jIwMde3aVf7+/goLC1Pbtm2Vnp4uSfrmm2/UunVrhYaGKiAgwOVrG7OysrRp0yb17NlTfn5+Cg4OVps2bbRmzRpb5gj8GU4fAdeIjY3V6NGjdeLECcXFxZn1586dU25urvk2NUkqW7asTp8+LUn6+eef8zx2VWZmpnJzc9W3b1+zzrIsj/qKR9w6KAXgGmXLllVYWJg2bdqkfv36mfWBgYHy9vZWZmamKlSoIOnXF/urRxKlS5dWZmamGX/tn0NCQuTj46PZs2fL29vbTTMBbgynj4Df6devn1588UWX2z57eXmpSZMmWrRokS5evKiTJ09q6dKl5ppDkyZNtHz5cp06dUrnz5/XkiVLzHNLly6tmJgYvfvuu8rOzpbT6dSPP/6onTt3un1uQH44UgB+Jzw8/Lrre/furTlz5mjQoEHy9fVVfHy8WrRoIUmKj4/XsWPHNGzYMPn7+6tdu3bavn27ee6gQYO0cOFCPfvss7p48aLKlSunDh06uGU+wF/B9ykAAAxOHwEADEoBAGBQCgAAg1IAABiUAgDAoBQAAAalAAAwKAUAgPH/jLlzELhZnKMAAAAASUVORK5CYII=\n",
            "text/plain": [
              "<Figure size 432x288 with 1 Axes>"
            ]
          },
          "metadata": {
            "tags": []
          }
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YfDZh0QZgKmQ",
        "colab_type": "text"
      },
      "source": [
        "Copyright 2022 Google Inc.\n",
        "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
        "http://www.apache.org/licenses/LICENSE-2.0\n",
        "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
      ]
    }
  ]
}
